housing price
RailEstate: An Interactive System for Metro Linked Property Trends
Chang, Chen-Wei, Cheng, Yu-Chieh, Tsai, Yun-En, Chen, Fanglan, Lu, Chang-Tien
Access to metro systems plays a critical role in shaping urban housing markets by enhancing neighborhood accessibility and driving property demand. We present RailEstate, a novel web based system that integrates spatial analytics, natural language interfaces, and interactive forecasting to analyze how proximity to metro stations influences residential property prices in the Washington metropolitan area. Unlike static mapping tools or generic listing platforms, RailEstate combines 25 years of historical housing data with transit infrastructure to support low latency geospatial queries, time series visualizations, and predictive modeling. Users can interactively explore ZIP code level price patterns, investigate long term trends, and forecast future housing values around any metro station. A key innovation is our natural language chatbot, which translates plain-English questions e.g., What is the highest price in Falls Church in the year 2000? into executable SQL over a spatial database. This unified and interactive platform empowers urban planners, investors, and residents to derive actionable insights from metro linked housing data without requiring technical expertise.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.17)
- North America > United States > Virginia > Alexandria County > Alexandria (0.06)
- North America > United States > New York > New York County > New York City (0.04)
- (3 more...)
- Banking & Finance > Real Estate (1.00)
- Transportation > Ground > Rail (0.94)
LLM-Powered CPI Prediction Inference with Online Text Time Series
Fan, Yingying, Lv, Jinchi, Sun, Ao, Wang, Yurou
Forecasting the Consumer Price Index (CPI) is an important yet challenging task in economics, where most existing approaches rely on low-frequency, survey-based data. With the recent advances of large language models (LLMs), there is growing potential to leverage high-frequency online text data for improved CPI prediction, an area still largely unexplored. This paper proposes LLM-CPI, an LLM-based approach for CPI prediction inference incorporating online text time series. We collect a large set of high-frequency online texts from a popularly used Chinese social network site and employ LLMs such as ChatGPT and the trained BERT models to construct continuous inflation labels for posts that are related to inflation. Online text embeddings are extracted via LDA and BERT. We develop a joint time series framework that combines monthly CPI data with LLM-generated daily CPI surrogates. The monthly model employs an ARX structure combining observed CPI data with text embeddings and macroeconomic variables, while the daily model uses a VARX structure built on LLM-generated CPI surrogates and text embeddings. We establish the asymptotic properties of the method and provide two forms of constructed prediction intervals. The finite-sample performance and practical advantages of LLM-CPI are demonstrated through both simulation and real data examples.
- Research Report > New Finding (1.00)
- Instructional Material > Online (1.00)
- Instructional Material > Course Syllabus & Notes (1.00)
Multimodal Machine Learning for Real Estate Appraisal: A Comprehensive Survey
Huang, Chenya, Li, Zhidong, Chen, Fang, Liang, Bin
Real estate appraisal has undergone a significant transition from manual to automated valuation and is entering a new phase of evolution. Leveraging comprehensive attention to various data sources, a novel approach to automated valuation, multimodal machine learning, has taken shape. This approach integrates multimodal data to deeply explore the diverse factors influencing housing prices. Furthermore, multimodal machine learning significantly outperforms single-modality or fewer-modality approaches in terms of prediction accuracy, with enhanced interpretability. However, systematic and comprehensive survey work on the application in the real estate domain is still lacking. In this survey, we aim to bridge this gap by reviewing the research efforts. We begin by reviewing the background of real estate appraisal and propose two research questions from the perspecve of performance and fusion aimed at improving the accuracy of appraisal results. Subsequently, we explain the concept of multimodal machine learning and provide a comprehensive classification and definition of modalities used in real estate appraisal for the first time. To ensure clarity, we explore works related to data and techniques, along with their evaluation methods, under the framework of these two research questions. Furthermore, specific application domains are summarized. Finally, we present insights into future research directions including multimodal complementarity, technology and modality contribution.
- Oceania > Australia > New South Wales > Sydney (0.04)
- Europe > Spain > Valencian Community > Alicante Province > Alicante (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.68)
MONOPOLY: Learning to Price Public Facilities for Revaluing Private Properties with Large-Scale Urban Data
Fan, Miao, Huang, Jizhou, Zhuo, An, Li, Ying, Li, Ping, Wang, Haifeng
The value assessment of private properties is an attractive but challenging task which is widely concerned by a majority of people around the world. A prolonged topic among us is ``\textit{how much is my house worth?}''. To answer this question, most experienced agencies would like to price a property given the factors of its attributes as well as the demographics and the public facilities around it. However, no one knows the exact prices of these factors, especially the values of public facilities which may help assess private properties. In this paper, we introduce our newly launched project ``Monopoly'' (named after a classic board game) in which we propose a distributed approach for revaluing private properties by learning to price public facilities (such as hospitals etc.) with the large-scale urban data we have accumulated via Baidu Maps. To be specific, our method organizes many points of interest (POIs) into an undirected weighted graph and formulates multiple factors including the virtual prices of surrounding public facilities as adaptive variables to parallelly estimate the housing prices we know. Then the prices of both public facilities and private properties can be iteratively updated according to the loss of prediction until convergence. We have conducted extensive experiments with the large-scale urban data of several metropolises in China. Results show that our approach outperforms several mainstream methods with significant margins. Further insights from more in-depth discussions demonstrate that the ``Monopoly'' is an innovative application in the interdisciplinary field of business intelligence and urban computing, and it will be beneficial to tens of millions of our users for investments and to the governments for urban planning as well as taxation.
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
From Predictive Importance to Causality: Which Machine Learning Model Reflects Reality?
Arshad, Muhammad Arbab, Kandanur, Pallavi, Sonawani, Saurabh
This study analyzes the Ames Housing Dataset using CatBoost and LightGBM models to explore feature importance and causal relationships in housing price prediction. We examine the correlation between SHAP values and EconML predictions, achieving high accuracy in price forecasting. Our analysis reveals a moderate Spearman rank correlation of 0.48 between SHAP-based feature importance and causally significant features, highlighting the complexity of aligning predictive modeling with causal understanding in housing market analysis. Through extensive causal analysis, including heterogeneity exploration and policy tree interpretation, we provide insights into how specific features like porches impact housing prices across various scenarios. This work underscores the need for integrated approaches that combine predictive power with causal insights in real estate valuation, offering valuable guidance for stakeholders in the industry.
- North America > United States > Iowa > Story County > Ames (0.05)
- North America > United States > Florida > Volusia County > Daytona Beach (0.04)
ShapG: new feature importance method based on the Shapley value
Zhao, Chi, Liu, Jing, Parilina, Elena
With wide application of Artificial Intelligence (AI), it has become particularly important to make decisions of AI systems explainable and transparent. In this paper, we proposed a new Explainable Artificial Intelligence (XAI) method called ShapG (Explanations based on Shapley value for Graphs) for measuring feature importance. ShapG is a model-agnostic global explanation method. At the first stage, it defines an undirected graph based on the dataset, where nodes represent features and edges are added based on calculation of correlation coefficients between features. At the second stage, it calculates an approximated Shapley value by sampling the data taking into account this graph structure. The sampling approach of ShapG allows to calculate the importance of features efficiently, i.e. to reduce computational complexity. Comparison of ShapG with other existing XAI methods shows that it provides more accurate explanations for two examined datasets. We also compared other XAI methods developed based on cooperative game theory with ShapG in running time, and the results show that ShapG exhibits obvious advantages in its running time, which further proves efficiency of ShapG. In addition, extensive experiments demonstrate a wide range of applicability of the ShapG method for explaining complex models. We find ShapG an important tool in improving explainability and transparency of AI systems and believe it can be widely used in various fields.
- Europe > Russia (0.04)
- Asia > Russia (0.04)
- Asia > China > Shandong Province > Qingdao (0.04)
- (3 more...)
A Novel Paradigm for Neural Computation: X-Net with Learnable Neurons and Adaptable Structure
Li, Yanjie, Li, Weijun, Yu, Lina, Wu, Min, Liu, Jinyi, Li, Wenqiang, Hao, Meilan
Artificial neural networks (ANNs) have permeated various disciplinary domains, ranging from bioinformatics to financial analytics, where their application has become an indispensable facet of contemporary scientific research endeavors. However, the inherent limitations of traditional neural networks arise due to their relatively fixed network structures and activation functions. 1, The type of activation function is single and relatively fixed, which leads to poor "unit representation ability" of the network, and it is often used to solve simple problems with very complex networks; 2, the network structure is not adaptive, it is easy to cause network structure redundant or insufficient. To address the aforementioned issues, this study proposes a novel neural network called X-Net. By utilizing our designed Alternating Backpropagation mechanism, X-Net dynamically selects appropriate activation functions based on derivative information during training to enhance the network's representation capability for specific tasks. Simultaneously, it accurately adjusts the network structure at the neuron level to accommodate tasks of varying complexities and reduce computational costs. Through a series of experiments, we demonstrate the dual advantages of X-Net in terms of reducing model size and improving representation power. Specifically, in terms of the number of parameters, X-Net is only 3$\%$ of baselines on average, and only 1.4$\%$ under some tasks. In terms of representation ability, X-Net can achieve an average $R^2$=0.985 on the fitting task by only optimizing the activation function without introducing any parameters. Finally, we also tested the ability of X-Net to help scientific discovery on data from multiple disciplines such as society, energy, environment, and aerospace, and achieved concise and good results.
Covariate-distance Weighted Regression (CWR): A Case Study for Estimation of House Prices
Chu, Hone-Jay, Chen, Po-Hung, Chang, Sheng-Mao, Ali, Muhammad Zeeshan, Patra, Sumriti Ranjan
Geographically weighted regression (GWR) is a popular tool for modeling spatial heterogeneity in a regression model. However, the current weighting function used in GWR only considers the geographical distance, while the attribute similarity is totally ignored. In this study, we proposed a covariate weighting function that combines the geographical distance and attribute distance. The covariate-distance weighted regression (CWR) is the extension of GWR including geographical distance and attribute distance. House prices are affected by numerous factors, such as house age, floor area, and land use. Prediction model is used to help understand the characteristics of regional house prices. The CWR was used to understand the relationship between the house price and controlling factors. The CWR can consider the geological and attribute distances, and produce accurate estimates of house price that preserve the weight matrix for geological and attribute distance functions. Results show that the house attributes/conditions and the characteristics of the house, such as floor area and house age, might affect the house price. After factor selection, in which only house age and floor area of a building are considered, the RMSE of the CWR model can be improved by 2.9%-26.3% for skyscrapers when compared to the GWR. CWR can effectively reduce estimation errors from traditional spatial regression models and provide novel and feasible models for spatial estimation.
- Asia > Taiwan > Taiwan Province > Taipei (0.04)
- Asia > Singapore (0.04)
- Asia > China (0.04)
- (2 more...)
10 real-world Data Science Projects
Detecting fraudulent activities in transactions made by customers in financial institutions by analyzing the transactional data. Fraud detection in machine learning involves using advanced algorithms to analyze transactional data and identify patterns that indicate fraudulent activities. The algorithms learn from historical data to detect anomalies and flag suspicious transactions in real-time, preventing financial losses for businesses and protecting customers from identity theft. Machine learning techniques such as clustering, decision trees, and neural networks are commonly used in fraud detection models. The accuracy and efficiency of these models depend on the quality and quantity of the data used to train them, as well as the ability to continually update the models with new data to adapt to evolving fraud patterns.
- Law Enforcement & Public Safety > Fraud (0.75)
- Banking & Finance > Real Estate (0.53)
Does Noise Affect Housing Prices? A Case Study in the Urban Area of Thessaloniki
Kamtziridis, Georgios, Vrakas, Dimitris, Tsoumakas, Grigorios
Real estate markets depend on various methods to predict housing prices, including models that have been trained on datasets of residential or commercial properties. Most studies endeavor to create more accurate machine learning models by utilizing data such as basic property characteristics as well as urban features like distances from amenities and road accessibility. Even though environmental factors like noise pollution can potentially affect prices, the research around this topic is limited. One of the reasons is the lack of data. In this paper, we reconstruct and make publicly available a general purpose noise pollution dataset based on published studies conducted by the Hellenic Ministry of Environment and Energy for the city of Thessaloniki, Greece. Then, we train ensemble machine learning models, like XGBoost, on property data for different areas of Thessaloniki to investigate the way noise influences prices through interpretability evaluation techniques. Our study provides a new noise pollution dataset that not only demonstrates the impact noise has on housing prices, but also indicates that the influence of noise on prices significantly varies among different areas of the same city.
- Europe > Greece > Central Macedonia > Thessaloniki (0.84)
- Europe > Italy > Apulia > Bari (0.04)
- Asia > Taiwan (0.04)
- (11 more...)